20 research outputs found
Improving the Cross-Lingual Generalisation in Visual Question Answering
While multilingual vision-language pretrained models offer clear benefits,
recent benchmarks across various tasks and languages have shown poor
cross-lingual generalisation when such models are applied to non-English data,
with a large gap between (supervised) English performance and (zero-shot)
cross-lingual transfer. In this work, we explore the poor performance of these models on a
zero-shot cross-lingual visual question answering (VQA) task, where models are
fine-tuned on English visual-question data and evaluated on 7 typologically
diverse languages. We improve cross-lingual transfer with three strategies: (1)
we introduce a linguistic prior objective to augment the cross-entropy loss
with a similarity-based loss to guide the model during training, (2) we learn a
task-specific subnetwork that improves cross-lingual generalisation and reduces
variance without model modification, (3) we augment training examples using
synthetic code-mixing to promote alignment of embeddings between source and
target languages. Our experiments on xGQA using the pretrained multilingual
multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of
the proposed fine-tuning strategy for 7 languages, outperforming existing
transfer methods with sparse models. Code and data to reproduce our findings
are publicly available.
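Strategy (1) above augments the standard cross-entropy loss with a similarity-based term. The abstract does not give the exact formulation, so the sketch below is only a minimal illustration of the general idea: penalising predictions whose answer embedding lies far from the gold answer's embedding, with an assumed weighting factor `alpha` and cosine similarity as the (assumed) measure.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def combined_loss(logits, target_idx, answer_embeddings, alpha=0.5):
    """Cross-entropy augmented with a similarity-based penalty (illustrative).

    logits: (num_answers,) model scores for each candidate answer
    target_idx: index of the gold answer
    answer_embeddings: (num_answers, d) embeddings of the answer labels
    alpha: assumed weight balancing the two terms (not from the paper)
    """
    p = softmax(logits)
    ce = -np.log(p[target_idx] + 1e-12)

    # Expected answer embedding under the model's predicted distribution
    expected = p @ answer_embeddings
    gold = answer_embeddings[target_idx]
    cos = expected @ gold / (np.linalg.norm(expected) * np.linalg.norm(gold) + 1e-12)
    sim_loss = 1.0 - cos  # small when the prediction is close to the gold answer

    return ce + alpha * sim_loss
```

A prediction concentrated on the gold answer incurs both a low cross-entropy and a low similarity penalty, so the combined objective still rewards correct classification while additionally shaping the embedding space.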
Zero-Shot Cross-Lingual Transfer with Meta Learning
Learning what to share between tasks has been a topic of great importance
recently, as strategic sharing of knowledge has been shown to improve
downstream task performance. This is particularly important for multilingual
applications, as most languages in the world are under-resourced. Here, we
consider the setting of training models on multiple different languages at the
same time, when little or no data is available for languages other than
English. We show that this challenging setup can be approached using
meta-learning, where, in addition to training a source language model, another
model learns to select which training instances are the most beneficial to the
first. We experiment using standard supervised, zero-shot cross-lingual, as
well as few-shot cross-lingual settings for different natural language
understanding tasks (natural language inference, question answering). Our
extensive experimental setup demonstrates the consistent effectiveness of
meta-learning for a total of 15 languages. We improve upon the state-of-the-art
for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA
dataset). A comprehensive error analysis indicates that the correlation of
typological features between languages can partly explain when parameter
sharing learned via meta-learning is beneficial. Comment: Accepted as a long paper at the EMNLP 2020 main conference.
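The abstract describes a second model that learns which training instances most benefit the source-language model. One minimal way to realise such instance selection is soft weighting of per-instance losses; the sketch below uses a hypothetical linear scorer with softmax-normalised weights, which is an assumption for illustration, not the paper's actual meta-learning architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def instance_weights(features, meta_params):
    """Score each training instance; higher weight = assumed more beneficial."""
    scores = features @ meta_params
    e = np.exp(scores - scores.max())
    return e / e.sum()  # softmax so the weights form a distribution

# Toy setup: 4 training instances with 3 features each (hypothetical values)
features = rng.normal(size=(4, 3))
meta_params = rng.normal(size=3)   # parameters the meta-learner would update
w = instance_weights(features, meta_params)

# The source-language model would then minimise a weighted task loss:
per_instance_loss = np.array([0.9, 0.4, 0.7, 0.2])
weighted_loss = w @ per_instance_loss
```

In a full meta-learning loop, `meta_params` would itself be trained, e.g. against validation performance on the target task, so that instances which transfer well receive larger weights.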
Boosting Radiology Report Generation by Infusing Comparison Prior
Recent transformer-based models have made significant strides in generating
radiology reports from chest X-ray images. However, a prominent challenge
remains: these models often lack prior knowledge, resulting in the generation
of synthetic reports that mistakenly reference non-existent prior exams. This
discrepancy can be attributed to a knowledge gap between radiologists and the
generation models. While radiologists possess patient-specific prior
information, the models solely receive X-ray images at a specific time point.
To tackle this issue, we propose a novel approach that leverages a rule-based
labeler to extract comparison prior information from radiology reports. This
extracted comparison prior is then seamlessly integrated into state-of-the-art
transformer-based models, enabling them to produce more realistic and
comprehensive reports. Our method is evaluated on English report datasets, such
as IU X-ray and MIMIC-CXR. The results demonstrate that our approach surpasses
baseline models in terms of natural language generation metrics. Notably, our
model generates reports that are free from false references to non-existent
prior exams, setting it apart from previous models. By addressing this
limitation, our approach represents a significant step towards bridging the gap
between radiologists and generation models in the domain of medical report
generation. Comment: Accepted at ACL 2023, BioNLP Workshop.
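The approach hinges on a rule-based labeler that extracts comparison-prior information from report text. The abstract does not list the actual rules, so the patterns below are hypothetical examples of what such a labeler might match; they illustrate the mechanism (regular-expression rules over report text), not the paper's rule set.

```python
import re

# Hypothetical comparison-prior phrases; the real labeler's rules are not
# given in the abstract.
COMPARISON_PATTERNS = [
    r"compared (?:to|with) (?:the )?prior",
    r"(?:unchanged|stable) (?:since|from) (?:the )?(?:prior|previous)",
    r"interval (?:increase|decrease|change)",
    r"no prior (?:exam|study|imaging)",
]

def extract_comparison_prior(report: str):
    """Return comparison-prior phrases found in a radiology report."""
    text = report.lower()
    hits = []
    for pat in COMPARISON_PATTERNS:
        hits += re.findall(pat, text)
    return hits
```

Phrases extracted this way could then be fed to the report-generation model as an extra input, so the model only mentions prior exams when the ground truth actually references one.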
ProVoc: an ontology for describing products on the Web
Numerous research efforts have long motivated the use of ontologies to meet the representation needs of e-Commerce. In this article, we present ProVoc (Product Vocabulary), an ontology whose objective is to describe products on the Web. Complementary to GoodRelations (Hepp, 2008), the most widely used Semantic Web ontology in the e-Commerce world, ProVoc focuses on a fine-grained representation of products and their related entities (product lines, product composition, etc.). Using the two ontologies together broadens the space of user queries, for example: "Which products contain ingredients harmful to health? Who sells them?". We show through SPARQL queries that our scenarios find an adequate formulation and a relevant representation with ProVoc. Finally, a competitive-intelligence application in the cosmetics domain is presented.
Vision transformer assisting rheumatologists in screening for capillaroscopy changes in systemic sclerosis: an artificial intelligence model.
OBJECTIVES
The first objective of this study was to implement and assess the performance and reliability of a vision transformer (ViT)-based deep-learning model, an 'off-the-shelf' artificial intelligence solution, for identifying distinct signs of microangiopathy in nailfold capillaroscopy (NFC) images of patients with SSc. The second objective was to compare the ViT's analysis performance with that of practising rheumatologists.
METHODS
NFC images of patients prospectively enrolled in our European Scleroderma Trials and Research group (EUSTAR) and Very Early Diagnosis of Systemic Sclerosis (VEDOSS) local registries were used. The primary outcome investigated was the ViT's classification performance for identifying disease-associated changes (enlarged capillaries, giant capillaries, capillary loss, microhaemorrhages) and the presence of the scleroderma pattern in these images using a cross-fold validation setting. The secondary outcome involved a comparison of the ViT's performance vs that of rheumatologists on a reliability set, consisting of a subset of 464 NFC images with majority vote-derived ground-truth labels.
RESULTS
We analysed 17 126 NFC images derived from 234 EUSTAR and 55 VEDOSS patients. The ViT had good performance in identifying the various microangiopathic changes in capillaries by NFC [area under the curve (AUC) from 81.8% to 84.5%]. In the reliability set, the rheumatologists reached a higher average accuracy, as well as a better trade-off between sensitivity and specificity compared with the ViT. However, the annotators' performance was variable, and one out of four rheumatologists showed equal or lower classification measures compared with the ViT.
CONCLUSIONS
The ViT is a modern, well-performing and readily available tool for assessing patterns of microangiopathy on NFC images, and it may assist rheumatologists in generating consistent and high-quality NFC reports; however, the final diagnosis of a scleroderma pattern in any individual case needs the judgement of an experienced observer.
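The primary outcome above is classification performance reported as AUC under cross-fold validation. As a minimal sketch of that evaluation protocol (the fold construction and split sizes here are illustrative, not the study's), AUC can be computed per fold from scores and binary labels and then averaged:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-statistic formulation:
    the probability that a random positive scores above a random negative."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def cross_fold_auc(scores, labels, k=5):
    """Mean AUC over k contiguous folds (illustrative split, not the study's)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    folds = np.array_split(np.arange(len(scores)), k)
    return float(np.mean([auc(scores[f], labels[f]) for f in folds]))
```

In practice each microangiopathic change (enlarged capillaries, giant capillaries, capillary loss, microhaemorrhages) would be scored as its own binary task, yielding the per-finding AUC range the study reports.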